AsterixDB: A Scalable, Open Source BDMS

Authors

  • Sattam Alsubaiee
  • Yasser Altowim
  • Hotham Altwaijry
  • Alexander Behm
  • Vinayak R. Borkar
  • Yingyi Bu
  • Michael J. Carey
  • Inci Cetindil
  • Madhusudan Cheelangi
  • Khurram Faraaz
  • Eugenia Gabrielova
  • Raman Grover
  • Zachary Heilbron
  • Young-Seok Kim
  • Chen Li
  • Guangqiang Li
  • Ji Mahn Ok
  • Nicola Onose
  • Pouria Pirzadeh
  • Vassilis J. Tsotras
  • Rares Vernica
  • Jian Wen
  • Till Westmann
Abstract

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today’s open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system’s data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early “customer” engagements.

Related resources

Scalable Fault-Tolerant Data Feeds in AsterixDB

In this paper we describe the support for data feed ingestion in AsterixDB, an open-source Big Data Management System (BDMS) that provides a platform for storage and analysis of large volumes of semi-structured data. Data feeds are a mechanism for having continuous data arrive into a BDMS from external sources and incrementally populate a persisted dataset and associated indexes. The need to pe...

A BAD Demonstration: Towards Big Active Data

Nearly all of today’s Big Data systems are passive in nature. We demonstrate our Big Active Data (“BAD”) system, a scalable system that continuously and reliably captures Big Data and facilitates the timely and automatic delivery of new information to a large population of interested users as well as supporting analyses of historical information. We built our BAD project by extending an existin...

Data Ingestion in AsterixDB

In this paper we describe the support for data ingestion in AsterixDB, an open-source Big Data Management System (BDMS) that provides a platform for storage and analysis of large volumes of semi-structured data. Data feeds are a new mechanism for having continuous data arrive into a BDMS from external sources and incrementally populate a persisted dataset and associated indexes. We add a new BD...

Storage Management in AsterixDB

Social networks, online communities, mobile devices, and instant messaging applications generate complex, unstructured data at a high rate, resulting in large volumes of data. This poses new challenges for data management systems that aim to ingest, store, index, and analyze such data efficiently. In response, we released the first public version of AsterixDB, an open-source Big Data Management...

Supporting Similarity Queries in Apache AsterixDB

Many applications require similarity query processing. Most existing work took an algorithmic approach, developing indexing structures, algorithms, and/or various optimizations. In this work, we choose to take a different, systems-oriented approach. We describe the support for similarity queries in Apache AsterixDB, a parallel, open-source Big Data management system for NoSQL data. We describe ...

Journal:
  • PVLDB

Volume 7

Publication date: 2014